Evolution dynamics in terraced NK landscapes 



Paolo Sibani and Andreas Pedersen 
Fysisk Institut, SDU-Odense Universitet 
Campusvej 55, DK5230 Odense M, Denmark 

February 4, 2008 



Abstract 

We consider populations of agents evolving in the fitness landscape of an 
extended NK model with a tunable amount of neutrality. We study the statis- 
tics of the jumps in mean population fitness which occur in the 'punctuated 
equilibrium' regime and show that, for a wide range of landscapes parameters, 
the number of events in time t is Poisson distributed, with the time parameter 
replaced by the logarithm of time. This simple log-Poisson statistics likewise 
describes the number of records in any sequence of t independently generated 
random numbers. The implications of such behavior for evolution dynamics are 
discussed. 

PACS numbers: 87.10.+e, 87.15.Aa, 05.40.-a 

Introduction. Evolutionary dynamics can be described as a stochastic process un- 
folding in a 'fitness landscape' (l), a process which can be simulated by means of 
genetic algorithms ||. The dynamical behavior of such algorithms is controlled by 
a number of parameters]^] as e.g. the population size, the rate of mutation and the 
strenght of selection. 

A pervasive dynamical regime is the so-called 'punctuated equilibrium' or 'epochal 
behavior', where relevant measures of evolution, as e.g. the mean fitness, remain con- 
stants for long periods during which the fitness distributions of the individuals is 
strongly peaked. Occasionally, a fitter mutant appears and quickly 'takes over' the 
population (see e.g. Fig. 1). Real experiments performed on bacterial colonies evolv- 
ing in a controlled environment have shown that the fitness and cell size increase at a 
decelerating rate |J. A similar slowing down is discussed by Kauffman|^| in the 'long 
jump' dynamics of the NK model, while Aranson et al.jTj] find a logarithmic growth 
of the average fitness for quasi-species evolving in a rugged fitness landscape. 

In a macroevolutionary context, Raup and SepkoskiQ suggested that the no- 
ticeable decay of the extinction rateQ might stem from the properties of an 
underlying optimization process. This idea was taken up in the 'reset' model 0,0, 
where jumps in the average fitness of populations are linked to fitness record achieved 
during evolution]!^]. Such a link between small and grand scale evolution is rather 
controversial: If macroevolutionary events are mainly driven extrinsically, e.g. by 
meteorite impacts jl3|], any patterns in the fossil record must ultimately stem from 
the mechanics of celestial bodies in the solar system[[l4|. Conversely, if, as recently 
proposed by several authors |l5| , the fossil record is mainly shaped by complex in- 
teractions within the biotic system, macro-evolutionary patterns must emerge from 
population dynamics. 

Our main interest lies in the statistical properties of the evolutionary jumps un- 
derlying fitness changes. Since the role of neutral mutations [fl6[ for evolution is well 
established, and since punctuated equilibria exist even in the absence of local fitness 
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Figure 1: The mean fitness / of a single population is shown in the main panel as a 
function of time. The right panel shows the fitness distributions at times t = 38500, 
t = 39500 and t = 40500. The simulation parameters are iV = 64, K = 31, n = 1000, 
fif = 0.5, <7f = 2.5 x 10~ 3 , (3 = 1, u = 10~ 4 and j = 10 9 , corresponding to the original 
NK model. 
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maxima^, [lTj], we chose to study evolution on 'terraced' landscapes with a tunable 
degree of neutrality. 

We find that, on average, the number of jumps taking place in time t grows propor- 
tionate to \ogt, and that the rate of events consequently decays as w 1/t. Secondly 
and most importantly, we show that the record dynamics Jl0| , |l2"[ provides a rea- 
sonable description of microevolution, with some limitations which are also outlined. 
Thirdly, we emphasize that power-law decays generically characterize the correlation 
functions of time series generated by a record driven dynamics. 

Method. Each 'genome' in a population constitutes a point in an abstract configu- 
ration space usually called a fitness landscape Ej. To construct such a land scap e we 
use an elegant prescription due to Kauffman, the widely known NK modeljl^]: We 
represent genomes as strings of N bits x = (xi,X2, ■ ■ ■ xn), each being either or 1. 
The fitness F(x) of configuration x is defined as 



where the contribution from site i, /,-, is a random function depending on xi and K 
other xi's. More precisely, /, is a random function of 2 K+1 arguments with values 
uniformly distributed in (0, 1]. We let /i/ and at be the average and spread of the 
distribution from which the fi values are generated. 

If K is zero, the change in fitness due to the change of one Xi (a point mutation) is 
of order 1 /N, and the landscape may be regarded as smooth. By way of contrast, when 
K = N — 1 a single point mutation changes all the /j's, and the landscape becomes 
'rugged'. Intermediate cases correspond, of course, to intermediate K values. 

Our version of the NK model is modified in two respects. Firstly, the sum in Eq.|l| 
is shifted by /x/ and scaled by -y/N rather than N. This ensures that the distribution 
of fitness values keeps the same variance for any value of N. More importantly, we 
introduce tunable neutrality by discretizing the fitness values into 'terraces', according 
to the formula: 

p'f-x) = riintQVN[F(x) - fj£ + jfjj) 
j 
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variable 


meaning 


values 


N 


genome length 


64 


K 


degree of ruggedness 


7, 15 and 31 


n 


population size 


1000 


a f 


1 C P 1 ' T71 l-l 1 

spread ot fi s m Fqjl] 


2.5 x 10~ 3 


M/ 


mean of jj S m Fq.|l] 


0.5 


3 


number or terraces 


10 - 10 9 w oo 


P 


reproductive selectivity 


-i 
1 


u 


mutation prob. per cloning 


10" 4 -5-10" 4 


s 


ensemble size 


s = 80 


1 


population averaged fitness 


see Fig. 1 


m 


no. of evol. steps 


see Fig. 2 


a m (t) 


sample average of m 




A 


Alogi w a m (t) 


see slopes of Figs. 2 and 3 



Table 1: The table summarizes the notation and parameter valuse used in the text. 



Here nint stands for the nearest integer function, and j denotes the number of terraces. 
For small j there are few broad terraces, while for j — ► oo the fitness values approach 
a continuum and the original NK model is regained. Accordingly, one expects that, 
as seen e.g. in Fig. 2, the effect of varying j should be stronger for small values of j. 

In the simulations, a configuration x is cloned with a probability p cx exp((3F(x)). 
With probability u « 1, the cloned string undergoes a one-point mutation at a ran- 
domly chosen locus. Finally, a random and fitness independent deletion mechanisms 
is applied, which keeps the population size fluctuating around a fixed average n. Both 
the generation / mutation part and the deletion part of the algorithm are performed 
sequentially. Subsequently, the information about the new fitness distribution is in- 
corporated in the cloning probabilities. This entire process counts as one update and 
defines the unit of time. 

Punctuated equilibrium dynamics requires that a fitter mutant be able to survive 
and spread in the population on a time scale short compared to the inverse mutation 
frequency. Therefore, u should not be too high and (3 should not be too low. Within 
these constraints u should be as high as possible, in order to have a good statistics 
within the time window of the computation. Also, too high a (3 value quenches the 
dynamics completely. These design consideration lead to the values of u and j3 used 
throughout the calculations. 

A concise description of the dynamics is provided by the average f(t) of the dis- 
tribution of fitness values through the population. Such a trajectory is shown in the 
main panel of Fig. 1 to consist of a number of flat plateaus separated by rather 
well defined jumps. The number m(t) of jumps occurring in the interval [0, i) is a 
stochastic process whose distribution can be sampled by repeating the simulations or, 
equivalently, by considering an ensemble of landscapes, where different trajectories 
are generated by independently updating each system. The ensemble average and 
variance of m are denoted by E(m) and a 2 (m) respectively, while a m (t) and v m (t) 
are the corresponding estimators. The notation describing the inputs and results of 
our simulations is summarized in Table 1. 

Results. The basic qualitative features of the data are expressed by Fig. 1: Its left 
panel shows the mean fitness / of a single population on a semilogarithmic scale. 
The right panel details the behavior close to the evolutionary jump at t rs 3 x 10 4 
by depicting the distribution of fitness values right before, during and right after 
the jump. Nearly all strings have the same fitness in the initial and final situations, 
while the transition stage features two different fitness values. Punctuated equilibrium 
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behavior and a very peaked fitness distribution are widely found in previous studies |l 
20f| as well as in our simulations. 



For a more quantitative data analysis, the statistical properties of the jumps and 
their associated waiting times must be studied. We let rrii(t) be the number of jumps 
occurred at time t in trajectory i, and consider the sample average: 



and the sample variance: 



a m (t) = £k^W (3 ) 



Vm[t) = EkMzag (4) 



as functions of time. To calculate error bars on a m and v m we need the corre- 
sponding standard deviations, which are <j(m)/^/s and A /(cr 2 (m 2 ) 4- 8E 2 (m)E(m 2 ) — 
4E(m)(E 3 (m) + E(m 3 )))/^/s + 0(1/ s). The latter relation results from straightfor- 
ward but rather tedious algebra. To lowest order in s -1 / 2 one may now replace the 
moments of m with their sample estimators. In general, the relative errors on various 
quantities of interest are of order l/\Js w 10%. 

Fig. 2 shows the average number of jumps, for a number of different parameters 
values, as a function of time. E(m) is seen to grow logarithmically, with a strong K 
dependence slope, for 'short' log-times. The leveling off noticed at large times stems 
from the fact that, as the fitness increases, fitness improvements become progressively 
smaller. Eventually, they get lost in the noise, rather than triggering a jump. At this 
point record statistics and fitness evolution must part company. 

For large t, the probability of n records in a sequence of t independently drawn 
random numbers is given bypl|: 

P l(n) . Oa*r t -K (5 ) 

This is a Poisson distribution with logt replacing the time argument. The strength 
parameter A describes the possibility that many searches for records take place inde- 
pendently and in parallel and/or the situation where records remain undetected. In 
our systems we observed that several records were indeed lost in the noise, and did 
not trigger any evolutionary event. 

A mathematically equivalent description of record dynamics is provided by the 
distribution of the variables 



A fc = logtfe - logtfc_i = log(tfc/tfc_i). (6) 

It follows from standard arguments that in a (log) Poisson process these A& are 
independent and have the common distribution: 

P(A > x) = exp(-Ax). (7) 

The empirical distribution of the A^'s is shown on a semilogarithmic scale in Fig. 3 
for K = 31 and for differing degrees of terracing. The decay of P(A > x) seems quite 
well described by an exponential for x < 5, with the tail of the distribution falling off 
more rapidly, likely due to the inevitably poor sampling of large x values in a finite 
time simulation. Banning the effect of the deviations from pure record statistics, 
A is, by Eqs. || and [?], the slope of a m (t) vs. logi as well as minus the slope of 
logP(A > x) vs x. We calculated these slopes from the data in both ways (cutting 
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Figure 2: Sample average of the number of fitness jumps, a m as a function of time t 
for landscapes with K = 31 and a : j = 10, b : j = 10 2 , c : j = 10 3 and d : j = 10 9 . 
The mutation rate is u — 5 x 10~ 4 . The insert shows the same quantity, but for 
landscapes with no terraces (j = 10 9 ) and with K = 7 (top curve), K = 15 and 
K = 31 (bottom curve). The mutation rate is here u = 10~ 4 . 




Figure 3: Distribution of log- waiting times for landscapes with K = 31 and a : j = 
10 2 , b:j= 10 3 , c : j = 10 4 , d : j = 10 5 , e : j = 10 6 and / : j = 10 9 . For graphical 
clarity the curves are vertically shifted relative to each other by a factor of four. The 
mutation rate is u = 5 x 10 -4 . 
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Figure 4: Sample variance, v m , of the number of fitness jumps versus sample average, 
a m , of the same quantity. The data are all for K = 31. Squares and filled diamonds 
are both for for unterraced landscapes (j = 10 9 ), and have mutation rates u = 10~ 4 
and u = 5 • 1CP 4 respectively. The other two data sets both have u = 5 • 10~ 4 . The 
degree of terracing is j = 10 2 (circles) and j = 10 3 (filled triangles). 



off the tails of the data), and obtained the following results, j — 10 2 , A = 0.54 (0.34); 
j = 10 3 , A = 0.49 (0.39); j = 10 4 , A = 0.58 (0.44); j = 10 5 , A = 0.54 (0.35); 
j = 10 6 , A = 0.62 (0.47); and j = 10 9 , A = 0.63 (0.39). The figures in parentheses 
stem from Fig. 3. There is a rough agreement, but certainly also considerable scatter 
in these data, with the log-wait time type of analysis yielding systematically lower 
figures. At this stage it is unclear whether the non-monotonic dependence of A on j 
seen in Fig. 3 is a real effect - or just due to a combination of statistical fluctuations 
and systematic deviations from the ideal log-Poisson behavior. 

Summarizing the results from Figs. 2 and 3, it appears that the value of K very 
strongly affects the average slope of the curves (i.e. the value of A). The degree of 
terracing might also have an effect on the slope, albeit a minor one. 

The independence of the different A& 's implied by the record statistics was tested 
by calculating the correlation coefficients Cy between Aj and Aj. In practice, we 
checked for C12, C23 and C13. As expected, the highest degree of correlation was 
found for the relatively smooth landscape with K = 7. In this case, the C values were 
close to 0.4. For K = 31 the correlation coefficients were of the order of 0.1, i.e. of 
the same order as the statistical sampling error. 

To conclude the description of our data we plot in Fig. 4 the estimated variance 
v m {t) versus the estimated average a m , for K = 31 and a number of terrace values. 
For a perfect agreement with the log-Poisson distribution the points should lie on a 
straight line of slope 1. This is close to the observed behavior, except for the highest 
values (where, on the other hand, the statistics is poorest). A similar plot for K = 15 
and K = 7 shows a systematic deviation from a straight line, with considerably less 
variance than in a purely random case. Additional details on the simulations and on 
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the genetic algorithm utilized to produce them can be found in Ref . [|22j . 

In summary, the dynamics of our evolutionary model is time inhomogeneous 
stochastic process, with a rate of events falling off as 1/t. The log-Poisson statis- 
tics describes the data best for landscapes with large K, with or without terraces. In 
this respect terraces have a minor effect on the dynamics. 

Discussion. While the shape of the fossil record certainly reflects many different 
factors fi1| , including e.g. biogeography [p3| and external perturbations of the abiotic 
environment fl3|| , the event statistics demonstrated here should have rather general 
implications for all models which do not completely dismiss the influence of population 
dynamics on macroevolution. If the evolutionary 'jumps' are the elementary events 
in any dynamics, possibly involving interactions between evolving species, using logt 
as the independent variable makes this dynamics appear as a stationary, markovian 
process, described e.g. by a master or Fokker-Planck equation. The eigenvalues 
(and eigenvectors) of the evolution equation describe the relaxation of any average 
of interest. Since the observational time window is usually narrow in log time, one 
is restricted to observing the decay of a single (or a few) relaxational mode(s). As 
an exponential function of log is a power-law, the above mechanism offers a generic 
explanation for the power-law like behavior found in several evolutionary patterns. 
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